Add link to MAX_RETRY allocation explain message #113657

matthewabbott · 2024-09-27T01:43:22Z

Adds a reference link for "maximum number of retries exceeded" to the max_retry allocation explanation string.

Adds more detail to the documentation page, explaining that allocation was stopped to protect the cluster, but that the underlying cause may since have been resolved, so allocation can be retried.

Also adds POST to the example _cluster/reroute API in the explanation because some customers would use GET and be confused why it didn’t work.
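
The GET-versus-POST confusion described above can be illustrated with a minimal sketch. It only builds the request and never sends it. The host `http://localhost:9200` and the helper name `build_reroute_request` are assumptions made for illustration; `retry_failed=true` is the query parameter the reroute API uses to retry failed allocations.

```python
from urllib.request import Request


def build_reroute_request(base_url: str = "http://localhost:9200") -> Request:
    """Build (but do not send) a reroute request that retries failed allocations.

    base_url is a hypothetical local cluster address, not taken from the PR.
    """
    url = f"{base_url.rstrip('/')}/_cluster/reroute?retry_failed=true"
    # State the method explicitly: urllib falls back to GET when there is no body.
    return Request(url, method="POST")


req = build_reroute_request()
print(req.get_method(), req.full_url)
```

Leaving out `method="POST"` here would produce a GET request, which is the mistake the updated explanation message is meant to prevent.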

github-actions · 2024-09-27T01:43:31Z

Documentation preview:

✨ Changed pages

DaveCTurner

Some small comments. Also you need to run ./gradlew precommit and fix up the issues.

docs/reference/cluster/allocation-explain.asciidoc

DaveCTurner · 2024-09-27T07:39:35Z

docs/reference/cluster/allocation-explain.asciidoc

+If no other `no` decisions are present, then the transient allocation issue
+that caused these failures has most likely been resolved, and you can use the
+<<cluster-reroute,the cluster reroute API>> to retry allocation.


I think this will be confusing: there are normally always some `no` decisions, e.g. for nodes in the wrong data tier.

Also I'd rather we used the imperative voice: "use the reroute API" rather than just suggesting "you can ...".

Finally, there's a duplicate "the" (one inside the link and one outside).

stefnestor · 2024-09-27

Brainstorming, I might say

Elasticsearch queues shard allocation retries in batches. If there are long running or a high quantity of shard recoveries occurring within the cluster, this process may time out for some shards resulting in MAX_RETRY. This surfaces infrequently but is expected to prevent infinite retries which may impact cluster performance. When encountered, run <<cluster-reroute,the cluster reroute API>> to retry allocation.

matthewabbott · 2024-10-14

Changed this to

Elasticsearch queues shard allocation retries in batches. If there are long-running shard
recoveries or a high quantity of shard recoveries occurring within the cluster, this
process may time out for some shards, resulting in max_retry. This surfaces infrequently
but is expected to prevent infinite retries which may impact cluster performance. When
encountered, run the <<cluster-reroute,cluster reroute>> API to retry allocation.

This is basically identical, but I tweaked the wording of the second sentence because I thought it read more clearly that way. I also moved "the" and "API" out of the link, per the suggestion from @DaveCTurner.

Thanks!

server/src/main/resources/org/elasticsearch/common/reference-docs-links.json

docs/reference/cluster/allocation-explain.asciidoc

matthewabbott · 2024-10-04T20:26:45Z

Yikes about all those commits. I did not rebase this correctly.


elasticsearchmachine · 2024-10-14T17:03:44Z

Pinging @elastic/es-docs (Team:Docs)

elasticsearchmachine · 2024-10-14T17:03:45Z

Pinging @elastic/es-distributed (Team:Distributed)

docs/reference/cluster/allocation-explain.asciidoc

DaveCTurner · 2024-10-15T09:56:50Z

docs/reference/cluster/allocation-explain.asciidoc

-the <<cluster-reroute,cluster reroute>> API to retry allocation.
+Elasticsearch queues shard allocation retries in batches. If there are long-running shard
+recoveries or a high quantity of shard recoveries occurring within the cluster, this
+process may time out for some shards, resulting in `max_retry`. This surfaces infrequently


This isn't true, there's no timeout in play here. You need to get 5 genuine failures in a row before you see this.

matthewabbott · 2024-10-17

I was thinking of changing this to

When Elasticsearch is unable to allocate a shard, it will attempt to retry allocation up to the maximum number of retries allowed. After this, Elasticsearch will stop attempting to allocate the shard in order to prevent infinite retries which may impact cluster performance. Run the <<cluster-reroute,cluster reroute>> API to retry allocation, which will allocate the shard if the issue preventing allocation has been resolved.

Are there any tweaks you’d like to make? Does that seem reasonable?
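
The behaviour described in the proposed wording can be sketched as a toy model. This is an illustration only, not Elasticsearch's implementation: the class name `ShardAllocationModel` and its methods are hypothetical, the default of 5 comes from the "5 genuine failures in a row" remark above, and treating a manual retry as a reset of the failure counter is an assumption of the model.

```python
from dataclasses import dataclass


@dataclass
class ShardAllocationModel:
    """Toy model of a per-shard retry limit (hypothetical, for illustration)."""

    max_retries: int = 5      # assumed default, per the review comment above
    failed_attempts: int = 0  # consecutive failed allocation attempts

    def can_attempt(self) -> bool:
        # Once the limit is reached, automatic retries stop.
        return self.failed_attempts < self.max_retries

    def record_attempt(self, succeeded: bool) -> None:
        # A success clears the count; a failure adds to it.
        self.failed_attempts = 0 if succeeded else self.failed_attempts + 1

    def retry_failed(self) -> None:
        # Models an operator-requested retry: clear the count so that
        # allocation can be attempted again.
        self.failed_attempts = 0


shard = ShardAllocationModel()
while shard.can_attempt():
    shard.record_attempt(succeeded=False)  # the underlying issue persists

blocked = not shard.can_attempt()          # the limit has been reached
shard.retry_failed()                       # manual retry after fixing the cause
shard.record_attempt(succeeded=True)
```

In this model the shard stays unallocated after the limit is reached until `retry_failed()` is called, and the final attempt succeeds only because the failure condition is assumed to have been fixed in the meantime.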

...n/generated/org/elasticsearch/xpack/esql/expression/function/scalar/math/HypotEvaluator.java


DaveCTurner

LGTM

Backport of elastic#113657 to `8.x`

Backport of #113657 to `8.x` Co-authored-by: matthewabbott <[email protected]>

elasticsearchmachine added the v9.0.0 and external-contributor labels Sep 27, 2024

DaveCTurner reviewed Sep 27, 2024


stefnestor reviewed Sep 27, 2024

docs/reference/cluster/allocation-explain.asciidoc (outdated)

Added max_retry docs explanation and linked to the docs in allocation explanation.

74905ea

matthewabbott force-pushed the matthewabbott_allocation_max_retries_doc branch from 17f738f to 74905ea Compare October 4, 2024 20:38

matthewabbott added 7 commits October 4, 2024 13:45

ran spotlessapply

0c4d80e

Merge branch 'main' into matthewabbott_allocation_max_retries_doc

8702053

Merge branch 'main' into matthewabbott_allocation_max_retries_doc

a8ea596

fixed max retry doc link

7ab3606

Merge remote-tracking branch 'upstream/main' into matthewabbott_allocation_max_retries_doc

ce153c9

fix retry explanation message string

95ddbc8

ran spotlessapply/precommit

7d77f8d

matthewabbott marked this pull request as ready for review October 14, 2024 17:03

DaveCTurner requested changes Oct 15, 2024


matthewabbott added 3 commits October 16, 2024 15:51

Merge remote-tracking branch 'upstream/main' into matthewabbott_allocation_max_retries_doc

8a6f70e

Tweak MAX_RETRY docs message and API example

b56252a

Merge branch 'main' into matthewabbott_allocation_max_retries_doc

f20ab50

DaveCTurner added the auto-backport and v8.17.0 labels Oct 18, 2024

DaveCTurner approved these changes Oct 18, 2024


DaveCTurner merged commit 9a8de1c into elastic:main Oct 18, 2024
15 checks passed

DaveCTurner pushed a commit to DaveCTurner/elasticsearch that referenced this pull request Oct 18, 2024

Add link to MAX_RETRY allocation explain docs

923b290

Backport of elastic#113657 to `8.x`

DaveCTurner mentioned this pull request Oct 18, 2024

Add link to MAX_RETRY allocation explain docs #115099

Merged

lkts pushed a commit to lkts/elasticsearch that referenced this pull request Oct 18, 2024

Add link to MAX_RETRY allocation explain docs (elastic#113657)

c742215

stefnestor mentioned this pull request Oct 23, 2024

Add link to allocation explain setting conflict message #115484

Open

georgewallace pushed a commit to georgewallace/elasticsearch that referenced this pull request Oct 25, 2024

Add link to MAX_RETRY allocation explain docs (elastic#113657)

cc4a604

elasticsearchmachine pushed a commit that referenced this pull request Oct 28, 2024

Add link to MAX_RETRY allocation explain docs (#115099)

b88e9a6

Backport of #113657 to `8.x` Co-authored-by: matthewabbott <[email protected]>

jfreden pushed a commit to jfreden/elasticsearch that referenced this pull request Nov 4, 2024

Add link to MAX_RETRY allocation explain docs (elastic#113657)

aff46f2


matthewabbott commented Sep 27, 2024

github-actions bot commented Sep 27, 2024

DaveCTurner left a comment

DaveCTurner Sep 27, 2024

stefnestor Sep 27, 2024

matthewabbott Oct 14, 2024

matthewabbott commented Oct 4, 2024

elasticsearchmachine commented Oct 14, 2024

elasticsearchmachine commented Oct 14, 2024

DaveCTurner Oct 15, 2024

matthewabbott Oct 17, 2024
